This tutorial follows from Section 2. “Modeling considerations when estimating CDD” of the main text:
Section 2. “Modeling considerations when estimating CDD”
For survival and growth models, defining the density metric and the size of local neighborhoods also require consideration. The density of conspecific and heterospecific neighbors surrounding a focal individual (or plot) can be measured within different distances (i.e., radii). Within a radius, neighbors should be weighted in the density summation with regard to their size and distance to the focal individuals, with the rationale that neighbors at a greater distance and of smaller size have smaller effects. Typically, weighting involves dividing the basal area or diameter by a linear function or an exponential function of distance to allow competitive effects of an individual of a given size to decline with distance (Uriarte et al. 2010; Canham et al. 2006). Biologically meaningful radii should be tested based on the size/life stage, vital rate of interest, and the likely agents of density dependence in the system. For a system of interest, we suggest comparing models with multiple distances (e.g., using maximum likelihood or information criterion) because forests may differ in how neighbors influence performance.
Here, we demonstrate how to calculate the density of conspecific, heterospecific, and all neighbors surrounding a focal individual or plot for a given distance when XY coordinates are known. As a result, this section requires that the user’s dataset contains the location of mapped stems. If locations are not known (as is common in seedling studies or smaller plots), continue to part 2 of this appendix.
We then demonstrate how to weight the calculation of neighborhood density by individual neighbor size (i.e., basal area) and distance using an exponential decay function (one of many possible decay functions), allowing the competitive effects of density values to saturate with increasing distances (Uriarte et al. 2010; Canham et al. 2006).
To assess which shape parameters of the exponential decay function are most appropriate for the data set, we fit models with multiple combinations of decay function values and compare models using log likelihood.
From a computational perspective, this approach can be relatively resource intensive both in terms of time and object size. It’s possible to make this approach more efficient by subdividing the data (e.g., by plot) or using more efficient data structures such as data.table.
We also note that alternative approaches allow the estimation of the effective scale of neighborhood interactions directly from data (see Barber et al. 2022). An excellent case study using the Stan programming language is available here.
# Load librarieslibrary(tidyr) # For data manipulationlibrary(dplyr) # For data manipulationlibrary(ggplot2) # For data plottinglibrary(spatstat.geom) # For spatial analysislibrary(here) # For managing working directorieslibrary(mgcv) # For fitting gamslibrary(lubridate) # For calculating census intervalslibrary(broom) # For processing fit modelslibrary(purrr) # For data manipulationlibrary(kableExtra) # For table styling
2.3 Data format explanation
For this tutorial, we will be using a small example subset of data from Barro Colorado Island (BCI), Panama (available here) that includes 7,028 observations, of 3,771 individuals of 16 tree species across two censuses from the larger 50 ha BCI Forest Dynamics Plot data set (more information here). Each stem is mapped and its diameter at 1.3m above ground is measured (DBH), which allows us to calculate neighborhood densities across different distances, and as a function of neighbor size (e.g., basal area), respectively.
The code below assumes the data is in a format where each row is an observation for an individual from a census. For this data set, the column descriptions are as follows:
treeID: unique identifier for each tree
sp: species code
gx: spatial coordinate on x axis
gy: spatial coordinate on y axis
dbh: diameter at breast height (mm)
ba: basal area (m^2)
status: status at census, A = alive, D = dead
date: date of observation
census: census number
mort: mortality status at census, 1 = dead, 0 = alive
mort_next: mortality status at next census, 1 = dead, 0 = alive
interval: time interval between censuses in years
Let’s take a quick look at the data set we’ll be working with:
Code
head(dat, n =5)
# A tibble: 5 × 12
treeID sp gx gy dbh ba status date mort mort_next
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <date> <dbl> <dbl>
1 3884 ceibpe 296. 24.8 1500 1.77 A 1981-11-10 0 0
2 3998 cordbi 280. 45.4 281 0.0620 A 1981-12-11 0 0
3 4065 ceibpe 289. 266. 2000 3.14 A 1982-01-08 0 0
4 4070 cordbi 283. 287. 356 0.0995 A 1982-01-08 0 0
5 4119 ceibpe 258. 224. 1600 2.01 A 1981-12-13 0 0
# ℹ 2 more variables: interval <dbl>, census <dbl>
We can produce a plot Figure 2.1 of the tree locations, where the size of the point is scaled to basal area and colored by species, with each census displayed as a panel:
Code
ggplot(dat, aes(x = gx, y = gy, size = ba, col = sp)) +geom_point() +facet_wrap(~census) +theme_bw(10) +theme(legend.position ="right") +coord_fixed(ratio =1) +labs(size ="Basal area", col ="Species")
Figure 2.1: Map of tree locations by census. Points are individual stems scaled by basal area, and colors represent the six-letter identifiers of 16 common species at the BCI forest plot.
2.4 Define an exponential decay function
In our neighborhood, we model neighborhood densities using an exponential decay function. In principle, it’s possible to use various different decay functions to explore various ranges that determine the rate of decay.
Here, we present the general framework for how one might explore different scenarios, though the ultimate choice of decay function and parameter values will depend on the study and is beyond the scope of this Appendix. Note that this method of calculating neighborhood density is only possible when neighbors’ x-y coordinates are known.
First, we write a function in R for exponential decay with distance, where mu controls the shape of the curve:
Let’s see what the exponential decay function looks like across a range of mu values (Figure 2.2):
Code
# Set range of mu values and distancesdecay_values <-seq(from =1, to =25, by =2)# Use sprintf to add leading 0s, will help with sorting later ondecay_names =paste("exp", sprintf("%02s", decay_values), sep ="") distances <-seq(1, 100, 1)# Generate a dataframe with each combination of mu and distanceexample_decay_values <-expand_grid(decay_values, distances) %>%# Rename columnsrename(decay_value = decay_values, distance = distances)# Evaluate distance decay function for each combination of # mu and distanceexample_decay_values$decay <-exponential_decay(mu = example_decay_values$decay_value,distance = example_decay_values$distance)# Plot resultsggplot(example_decay_values, aes(x = distance, y = decay, color = decay_value, group = decay_value)) +ylab("Density") +xlab("Distance (m)") +scale_color_continuous(name ="Value of \ndecay constant") +geom_line() +theme_bw(12)
Figure 2.2: Plot of decay function
2.4.1 Determine which trees are at edge of plot
Trees near the edge of plot boundaries have incomplete information about their neighborhood, because neighboring trees outside of the plot boundaries are not mapped, and we do not advise using boundary corrections to include those stems.Typically trees within certain distance threshold from the plot edge are excluded from analysis, but still included in calculations of neighborhood densities.
We are going to add a column to our data set called ‘edge’ that is TRUE if within a set distance to the edge of the plot (e.g., 30 m in the example below).
For our example data set, the dimensions of the plot are 300 x 300 m, ranging from 0-300 on both the x and y axis, and representing a subset of the overall 50 ha forest dynamics plot at BCI.
Code
# Set threshold for distance to edgedistance_threshold_edge =30# Add in min and max values for corners of plotmin_x <-0max_x <-300min_y <-0max_y <-300dat$edge = dat$gx < min_x + distance_threshold_edge | dat$gx > max_x - distance_threshold_edge | dat$gy < min_y + distance_threshold_edge | dat$gy > max_y - distance_threshold_edge# How many trees fall within the edge threshold?table(dat$edge)
FALSE TRUE
4526 2502
Below is a plot Figure 2.3 of tree locations colored by whether they fall within the edge threshold or not, separated out for each census.
Code
ggplot(dat, aes(x = gx, y = gy, col = edge)) +geom_point() +facet_wrap(~census) +coord_fixed(ratio =1) +theme_bw(12)
Figure 2.3: Map of tree locations colored by whether they fall within the edge threshold
2.5 Calculate distances among focal individual and neighbors
Next, we will calculate distances among individuals in our plot, using an upper threshold distance of what to consider a neighbor.
Because this example only includes one census interval (two censuses total), we will subset down to just the first census to calculate neighborhood density.
If the data set were to contain multiple census intervals, it would be necessary to calculate neighborhood density separately for each individual in each census interval, using only the individuals that were alive at the beginning of that census interval, or using an average density value between two consecutive censuses.
Code
# Set distance threshold for considering neighborsdistance_threshold_neighborhood <-30# Subset to first census - this will be different for different # datasetsdat_first_census <- dat %>%filter(census ==1)# Format into 'ppp' objectdat_ppp = spatstat.geom::ppp(dat_first_census$gx, dat_first_census$gy, window =owin(range(dat$gx), range(dat$gy)), checkdup = F)# Determine close pairs based on distance threshold# Returns a list that we convert to tibble laterneighbors = spatstat.geom::closepairs(dat_ppp, rmax = distance_threshold_neighborhood, # Max radiuswhat ="ijd", # return indicies i, j, and distance twice =TRUE)# Convert to dataframeneighbors <-as_tibble(neighbors)# Take a peek at the data# i = index of focal individual# j = index of neighbor# d = distance to neighborhead(neighbors)
Next we add in additional columns for individual characteristic, (e.g., species, size) and whether they are located near the edge of the plot (blue area in Figure 2.3).
Code
# add additional columns# Add species for individual ineighbors$sp_i = dat_first_census$sp[neighbors$i] # Add whether individual i is near edgeneighbors$edge_i = dat_first_census$edge[neighbors$i] # Add species for individual jneighbors$sp_j = dat_first_census$sp[neighbors$j] # Add basal area of individual jneighbors$ba_j = dat_first_census$ba[neighbors$j]
We then want to add a column that indicates whether the comparison between the focal individual and the neighbor is conspecific or heterospecific because we are interested separately estimating the densities of conspecifics and heterospecifics.
We then remove focal trees that are too close to the edge of the plot.
Code
# remove focal trees i that are at the edgeneighbors = neighbors[!neighbors$edge_i, ]
Next, we add columns to our neighbors data set that indicates the distance decay multiplier and the distance decay multiplier weighted by basal area.
Code
# Loop through distance decay valuesfor(x in1:length(decay_values)){# Add column for distance decay multiplier for each decay value# add _ba suffix to column name - will eventually be summed based on# number of individual neighbors neighbors[, paste0(decay_names[x], "_N")] <-exponential_decay(mu = decay_values[x], distance = neighbors$d)# Weight basal area by distance decay multiplier# add _ba suffix to column name neighbors[, paste0(decay_names[x], "_BA")] <-exponential_decay(mu = decay_values[x], distance = neighbors$d) * neighbors$ba_j}
Depending on how many distance decay values are being investigated, there may be many columns in the data frame.
Then we summarize the neighborhood density for each focal tree separately for conspecifics and heterospecifics.
Code
# Simple calculations of number of neighbors and total basal area, # ignoring distance decay multiplierneighbors_summary <- neighbors %>%group_by(i, comparison_type) %>%summarise(nodecay_N =n(), # count of neighborsnodecay_BA =sum(ba_j)) # sum of basal area)# Add in decay columnsneighbors_summary_decay <- neighbors %>%group_by(i, comparison_type) %>%# Select only columns related to distance decayselect(starts_with("exp")) %>%# Summarize them all by summing columnssummarise_all(sum) # Join both togetherneighbors_summary <-left_join(neighbors_summary, neighbors_summary_decay,by =c("i", "comparison_type"))# Add treeID columnneighbors_summary$treeID<-dat_first_census$treeID[neighbors_summary$i]# If there are any focal individuals with no neighbors, add values # of 0 for neighborhood densities noNeighbors = dat_first_census$treeID[!dat_first_census$treeID %in% neighbors_summary$treeID &!dat_first_census$edge]# If there are individuals with no neighborsif (length(noNeighbors) >0) { neighbors_summary =bind_rows(neighbors_summary, expand_grid(i =NA, treeID = noNeighbors, comparison_type =c("het", "cons"))) %>%# Add 0s where NAmutate_all(replace_na, replace =0) }# Take a peak at the datahead(neighbors_summary)
As described in the main text, it can be advantageous to use total density that includes combined conspecific and heterospecific densities as a covariate, rather than heterospecific density.
Here, we calculate total density by summing heterospecific and conspecific densities.
Code
# First convert to long format which will make it easy to sum across # heterospecific and conspecific valuesneighbors_summary_long_format <- neighbors_summary %>%pivot_longer(cols =-c("i", "comparison_type", "treeID"))# Sum across heterospecific and conspecific values and rename to 'total'neighbors_total_long_format <- neighbors_summary_long_format %>%group_by(i, treeID, name) %>%summarize(value =sum(value)) %>%mutate(comparison_type ="total")# Bind together conspecific and 'total' densities# remove heterospecific columns# fill in 0s where there are no neighborsneighbors_summary_total =bind_rows(neighbors_summary_long_format, neighbors_total_long_format) %>%# Can filter out heterospecific neighborhood # values by uncommenting this line of the# objects become too large# filter(comparison_type != "het") %>% mutate(name =paste0(comparison_type, "_", name)) %>%select(-comparison_type) %>%pivot_wider(names_from = name, values_from = value, values_fill =0)
2.7 Model mortality as a function of neighborhood density
To determine the ‘best’ decay parameter to use, we fit species-specific mortality models using Generalized Additive Models GAMs and compare models based on log likelihoods. GAMs are flexible statistical models that combine linear and non-linear components to capture complex relationships in data.
We first create our data set that we will use in the GAMs, subsetting down to just one census interval (because our example dataset only has 2 censuses) and removing trees close to the edge as above. In other datasets, you may have multiple census intervals, where it would be common practice to include ‘year’ or ‘census’ as a random effect in the model, but otherwise the overall approach is similar.
Code
# Join census data with neighborhood datadat_gam <-left_join(dat_first_census, neighbors_summary_total,by ="treeID")# Remove edge trees (these will have NAs for densities calcualations)dat_gam <- dat_gam %>%filter(edge ==FALSE)
For each species, we summarize data availability to help determine parameters for the GAM smooth terms.
Code
# Summarize data availability at species level to set the degree of # smoothness for GAM smooth termssp_summary <- dat_gam %>%group_by(sp) %>%summarise(ndead =sum(mort_next),range_con_BA =max(con_nodecay_BA) -min(con_nodecay_BA),max_con_BA =max(con_nodecay_BA),unique_con_BA =length(unique(con_nodecay_BA)),unique_total_BA =length(unique(total_nodecay_BA)),range_con_N =max(con_nodecay_N) -min(con_nodecay_N),max_con_N =max(con_nodecay_N),unique_con_N =length(unique(con_nodecay_N)),unique_total_N =length(unique(total_nodecay_N)),unique_dbh =length(unique(dbh)) )# Filter out species that have 0% mortalitysp_summary <- sp_summary %>%filter(ndead >0)
In this long block of code, we loop over all possible combinations of decay values for different metrics of neighborhood densities weighted by distance for each species and fit a separate GAM for each model. For each GAM, we assess whether the model was able to be fit, and when it was able to be fit, whether it converged or produced warnings. We save the results of successful model fits into a list that we will process later.
For large datasets where individual GAMs take a long time to run, the code could be modified to run in parallel, either locally on a personal computer or across a computing cluster.
Code
# Initialize list that will save model outputsres_mod <-list()# Model run settingsrun_settings <-expand_grid(species =unique(sp_summary$sp),decay_con =c("nodecay", decay_names),decay_total =c("nodecay", decay_names),nhood_data_type =c("N", "BA"))# Loop through model run settingsfor(run_settings_row in1:nrow(run_settings)){# Extract values from run settings dataframe species <- run_settings$species[run_settings_row] decay_con <- run_settings$decay_con[run_settings_row] decay_total <- run_settings$decay_total[run_settings_row] nhood_data_type <- run_settings$nhood_data_type[run_settings_row]# Subset down to just focal species dat_subset <- dat_gam %>%filter(sp == species)# Set run name run_name <-paste0(species, "_total", decay_total,"_con", decay_con, "_", nhood_data_type)# Print status if desired# cat("Working on run: ", run_name, " ...\n")# Create model formula# Initial DBH included as predictor variable form =paste0("mort_next ~ s(dbh, k = k1) + s(total_", decay_total, "_", nhood_data_type, ", k = k2) + s(con_", decay_con, "_", nhood_data_type, ", k = k3)")# Convert into formula form <-as.formula(form)# Choose penalties for model fitting# set to default 10 (the same as -1)# The higher the value of k, the more flexible the smooth term becomes, allowing for more intricate and wiggly patterns. Conversely, lower values of k result in smoother and simpler representations. k1 = k2 = k3 =10if (k1 > sp_summary$unique_dbh[sp_summary$sp == species]) { k1 = sp_summary$unique_dbh[sp_summary$sp == species] -2 }if (k2 > sp_summary$unique_total_N[sp_summary$sp == species]) { k2 = sp_summary$unique_total_N [sp_summary$sp == species]-2 }if (k3 > sp_summary$unique_con_N[sp_summary$sp == species]) { k3 = sp_summary$unique_con_N[sp_summary$sp == species] -2 }# Fit model# wrap in a try function to catch any errors mod =try(gam(form,family =binomial(link=cloglog),offset =log(interval),data = dat_subset,method ="REML"), silent = T)# Check if model was able to fitif (!any(class(mod) =="gam")) {# print(paste("gam failed for:", run_name)) } else {# Check if gam convergedif (!mod$converged) {# print(paste("no convergence for:", run_name)) } else {# check for complete separation# https://stats.stackexchange.com/questions/336424/issue-with-complete-separation-in-logistic-regression-in-r# Explore warning "glm.fit: fitted probabilities numerically 0 # or 1 occurred" eps <-10* .Machine$double.eps glm0.resids <-augment(x = mod) %>%mutate(p =1/ (1+exp(-.fitted)),warning = p >1-eps,influence =order(.hat, decreasing = T)) infl_limit =round(nrow(glm0.resids)/10, 0)# check if none of the warnings is among the 10% most # influential observations, than it is okay.. num =any(glm0.resids$warning & glm0.resids$influence < infl_limit)# complete separationif (num) {# print(paste("complete separation is likely for:", run_name)) } else {# missing Vcif (is.null(mod$Vc)) {# print(paste("Vc not available for:", run_name)) } else {# Add resulting model to list if it passes all checks res_mod[[run_name]] <- mod } # Vc ifelse } # complete separation ifelse } # convergence ifelse } # model available ifelse} # end run settings loop
2.8 Summarize model fits
Next we will extract summaries for each model with broom::glance() that provides key information like degrees of freedom, log likelihood, AIC, etc.
Code
# Extract summaries for each model into a listsums =lapply(res_mod, broom::glance) # Add a column for model run to each object in the listsums =Map(cbind, sums, run_name =names(sums)) sums =do.call(rbind, sums) # Bind elements of list together by rowsrownames(sums) <-NULL# Remove row names# Separate run name into columns for species, decay, and density typesums <- sums %>%separate(run_name, into =c("sp", "decay_total", "decay_con", "density_type"), remove =FALSE)# Remove 'total' and 'con' from decay columnssums$decay_total <-gsub("total", "", sums$decay_total)sums$decay_con <-gsub("con", "", sums$decay_con)# Rearrange columns sums <- sums %>%select(run_name, sp, decay_total, decay_con, density_type, everything()) # Take a look at the model summarieshead(sums)
Due to limited sample sizes, it is likely that GAMs will not fit for each species. For example, in the table below of summary data for species where models did not successfully fit/converge, there are several instances where all individuals of the species survived during the census, leading to no mortality events to use in the model.
We need to exclude species without complete model runs from our overall calculations when looking for optimal decay parameters across the entire data set.
Code
# Tally up number of model runs by species and total decay valuestable(sums$sp, sums$decay_total)
# get incomplete run-site-species combinationsrun_counts_by_sp <- sums %>%group_by(sp) %>%tally() %>%# Join with overall species listleft_join(sp_summary %>%select(sp), ., by ="sp") # Get expected number of runs if all models workexpected_total_runs <- run_settings %>%group_by(species) %>%tally() %>%pull(n) %>%max()# Save species names where they didn't have all expected combinations # of model runsincomplete = run_counts_by_sp$sp[run_counts_by_sp$n < expected_total_runs |is.na(run_counts_by_sp$n)]# Species with successful runshead(sp_summary[!sp_summary$sp %in% incomplete, ])
2.9 Selecting optimum decay parameter values across all species
We then summarize different model criteria across all species runs. To look for the optimal value for decay parameters, we sum log likelihoods across all species for a given decay parameter combination and choose the resulting parameter combination with the highest summed log likelihood.
One crucial point is that each run of the grid search should be done with the same set of trees, even if smaller neighborhodoes not need a buffer of 30 m.
# A tibble: 392 × 6
# Groups: decay_total, decay_con [196]
decay_total decay_con density_type nvalues sumlogLik meanlogLik
<chr> <chr> <chr> <int> <dbl> <dbl>
1 exp01 exp01 BA 4 -556. -139.
2 exp01 exp01 N 4 -554. -138.
3 exp01 exp03 BA 4 -553. -138.
4 exp01 exp03 N 4 -558. -140.
5 exp01 exp05 BA 4 -554. -138.
6 exp01 exp05 N 4 -561. -140.
7 exp01 exp07 BA 4 -557. -139.
8 exp01 exp07 N 4 -563. -141.
9 exp01 exp09 BA 4 -558. -140.
10 exp01 exp09 N 4 -564. -141.
# ℹ 382 more rows
We then create a heatmap plot Figure 2.4 of summed log likelihoods for all fixed devay (mu) parameter combinations across all species, with the optimal parameter combination marked with an X
Code
# Find optimum value separately for N and BA optimum <- sums_total %>%group_by(density_type) %>%slice_max(sumlogLik)# Plot heatmap of log likelihood valuesggplot(sums_total, aes(x = decay_total, y = decay_con, fill = sumlogLik)) +geom_tile(width =0.9, height =0.9, col ="black") +scale_fill_viridis_c() +# viridis color palettegeom_label(data = optimum, label ="X") +labs(x ="Decay total density", y ="Decay conspecific density", fill ="sumlogLik") +facet_wrap(~density_type, ncol =1) +theme_bw(12) +theme(axis.text.x =element_text(angle =90, vjust =0.5, hjust=1))
Figure 2.4: Heatmap of optimal values for decay constants
For this data set, the following are the optimal decay parameter values across all species separately for neighborhood density calculated by abundance (N) and by basal area (BA)
2.10 Selecting optimum decay parameter values separately for each species
Alternatively, it may be useful to determine optimum decay parameters on a species by species basis. To do this, we find the maximum log liklihood for each species separately across decay parameter combination and choose the resulting parameter combination with the highest log likelihood.
# A tibble: 1,568 × 7
# Groups: decay_total, decay_con, density_type [392]
decay_total decay_con density_type sp nvalues sumlogLik meanlogLik
<chr> <chr> <chr> <chr> <int> <dbl> <dbl>
1 exp01 exp01 BA cappfr 1 -16.6 -16.6
2 exp01 exp01 BA des2pa 1 -489. -489.
3 exp01 exp01 BA micone 1 -43.7 -43.7
4 exp01 exp01 BA stylst 1 -6.84 -6.84
5 exp01 exp01 N cappfr 1 -14.3 -14.3
6 exp01 exp01 N des2pa 1 -489. -489.
7 exp01 exp01 N micone 1 -41.3 -41.3
8 exp01 exp01 N stylst 1 -8.64 -8.64
9 exp01 exp03 BA cappfr 1 -20.7 -20.7
10 exp01 exp03 BA des2pa 1 -485. -485.
# ℹ 1,558 more rows
We create a heatmap plot Figure 2.5 of log likelihoods for all fixed decay parameter combinations for each species separately, with the optimal parameter combination marked with an X. We only display the first three species here, because with data sets containing many species, it’s hard to visualize all the species on one graph and you may wish to subdivide the plot further for visualization.
Code
sums <- sums %>%filter(sp %in%c("cappfr", "cordbi", "des2pa")) %>%group_by(sp) %>%# Scale loglikihood by species to help with visualizationmutate(logLik_scaled =scale(logLik)) %>%ungroup()# Find optimum value separately for N and BA optimum_by_sp <- sums %>%group_by(sp, density_type) %>%slice_max(logLik_scaled, with_ties =FALSE)# Plot heatmap of log likelihood valuesggplot(sums, aes(x = decay_total, y = decay_con,fill = logLik_scaled)) +geom_tile(width =0.9, height =0.9, col ="black") +geom_label(data = optimum_by_sp, label ="X") +labs(x ="Decay total density", y ="Decay conspecific density", fill ="logLik") +facet_wrap(~sp + density_type, ncol =2, scales ="free") +scale_fill_viridis_c() +# viridis color palettetheme_bw(12) +theme(legend.position ="right") +labs(fill ="Log likelihood\n(scaled)") +theme(axis.text.x =element_text(angle =90, vjust =0.5, hjust=1))
Figure 2.5: Heatmap of optimal values for decay constants separately for each species
For this data set, the following are the optimal decay parameter values for each species separately for neighborhood density calculated by abundance (N) and by basal area (BA):
Barber, Cristina, Andrii Zaiats, Cara Applestein, Lisa Rosenthal, and T Trevor Caughlin. 2022. “Bayesian Models for Spatially Explicit Interactions Between Neighbouring Plants.”Methods in Ecology and Evolution.
Canham, Charles D, Michael J Papaik, Marı́a Uriarte, William H McWilliams, Jennifer C Jenkins, and Mark J Twery. 2006. “Neighborhood Analyses of Canopy Tree Competition Along Environmental Gradients in New England Forests.”Ecological Applications 16 (2): 540–54.
Hülsmann, Lisa, Ryan A Chisholm, Liza Comita, Marco D Visser, Melina de Souza Leite, Salomon Aguilar, Kristina J Anderson-Teixeira, et al. 2024. “Latitudinal Patterns in Stabilizing Density Dependence of Forest Communities.”Nature 627 (8004): 564–71.
Uriarte, Marı́a, Nathan G Swenson, Robin L Chazdon, Liza S Comita, W John Kress, David Erickson, Jimena Forero-Montaña, Jess K Zimmerman, and Jill Thompson. 2010. “Trait Similarity, Shared Ancestry and the Structure of Neighbourhood Interactions in a Subtropical Wet Forest: Implications for Community Assembly.”Ecology Letters 13 (12): 1503–14.
---output: html_documentprefer-html: trueeditor_options: chunk_output_type: consoleexecute: cache: true---# Calculating neighborhood densities## OverviewThis tutorial follows from Section 2. "Modeling considerations when estimating CDD" of the main text:::: callout-note## Section 2. "Modeling considerations when estimating CDD"For survival and growth models, defining the density metric and the size of local neighborhoods also require consideration. The density of conspecific and heterospecific neighbors surrounding a focal individual (or plot) can be measured within different distances (i.e., radii). Within a radius, neighbors should be weighted in the density summation with regard to their size and distance to the focal individuals, with the rationale that neighbors at a greater distance and of smaller size have smaller effects. Typically, weighting involves dividing the basal area or diameter by a linear function or an exponential function of distance to allow competitive effects of an individual of a given size to decline with distance [@uriarte2010trait; @canham2006neighborhood]. Biologically meaningful radii should be tested based on the size/life stage, vital rate of interest, and the likely agents of density dependence in the system. For a system of interest, we suggest comparing models with multiple distances (e.g., using maximum likelihood or information criterion) because forests may differ in how neighbors influence performance.:::Here, we demonstrate how to calculate the density of conspecific, heterospecific, and all neighbors surrounding a focal individual or plot for a given distance when XY coordinates are known. As a result, this section requires that the user's dataset contains the location of mapped stems. If locations are not known (as is common in seedling studies or smaller plots), continue to part 2 of this appendix.We then demonstrate how to weight the calculation of neighborhood density by individual neighbor size (*i.e.,* basal area) and distance using an exponential decay function (one of many possible decay functions), allowing the competitive effects of density values to saturate with increasing distances [@uriarte2010trait; @canham2006neighborhood].To assess which shape parameters of the exponential decay function are most appropriate for the data set, we fit models with multiple combinations of decay function values and compare models using log likelihood.From a computational perspective, this approach can be relatively resource intensive both in terms of time and object size. It's possible to make this approach more efficient by subdividing the data (*e.g.,* by plot) or using more efficient data structures such as [data.table](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html).We also note that alternative approaches allow the estimation of the effective scale of neighborhood interactions directly from data [see @barber2022bayesian]. An excellent case study using the Stan programming language is [available here](https://mc-stan.org/users/documentation/case-studies/plantInteractions.html).::: callout-noteThe following code is heavily adapted from the [latitudinalCNDD repository](https://github.com/LisaHuelsmann/latitudinalCNDD) by [Lisa Hülsmann](https://demographicecology.com/) from the publication @hulsmann2024latitudinal.:::## Load libraries and data```{r, message=FALSE}# Load librarieslibrary(tidyr) # For data manipulationlibrary(dplyr) # For data manipulationlibrary(ggplot2) # For data plottinglibrary(spatstat.geom) # For spatial analysislibrary(here) # For managing working directorieslibrary(mgcv) # For fitting gamslibrary(lubridate) # For calculating census intervalslibrary(broom) # For processing fit modelslibrary(purrr) # For data manipulationlibrary(kableExtra) # For table styling```## Data format explanationFor this tutorial, we will be using a small example subset of data from Barro Colorado Island (BCI), Panama ([available here](https://github.com/LisaHuelsmann/latitudinalCNDD/tree/main/data_prep/input/data_tree)) that includes 7,028 observations, of 3,771 individuals of 16 tree species across two censuses from the larger 50 ha BCI Forest Dynamics Plot data set ([more information here](https://forestgeo.si.edu/sites/neotropics/barro-colorado-island)). Each stem is mapped and its diameter at 1.3m above ground is measured (DBH), which allows us to calculate neighborhood densities across different distances, and as a function of neighbor size (*e.g.*, basal area), respectively.The code below assumes the data is in a format where each row is an observation for an individual from a census. For this data set, the column descriptions are as follows:- **treeID**: unique identifier for each tree- **sp**: species code- **gx**: spatial coordinate on x axis- **gy**: spatial coordinate on y axis- **dbh:** diameter at breast height (mm)- **ba:** basal area (m\^2)- **status:** status at census, A = alive, D = dead- **date:** date of observation- **census**: census number- **mort**: mortality status at census, 1 = dead, 0 = alive- **mort_next**: mortality status at next census, 1 = dead, 0 = alive- **interval**: time interval between censuses in years```{r echo=FALSE}# This code is not shown in report# Load in and clean example BCI data # Loads in a list object called 'tree' with separated by censusload(here("./data/site01_tree.Rdata"))# Restructure data for mortality analysisfor (census in1:(length(tree)-1)) { tree[[census]]$surv =ifelse(grepl("A", tree[[census]]$status), 1, 0) # status at t=0 tree[[census]]$surv_next =ifelse(grepl("A", tree[[census+1]]$status), 1, 0) # status at t=1 tree[[census]]$mort =ifelse(grepl("A", tree[[census]]$status), 0, 1) # status at t=0 tree[[census]]$mort_next =ifelse(grepl("A", tree[[census+1]]$status), 0, 1) # status at t=1 tree[[census]]$interval =time_length(difftime(tree[[census+1]]$date , tree[[census]]$date), "years") # length of interval}# Bind together into single tibbledat <-tibble(bind_rows(tree[[1]] %>%mutate(census =1), tree[[2]] %>%mutate(census =2)) %>% dplyr::select(-c(homchange, nostems))) %>%filter(status %in%c("A", "D"))# Remove surv and surv next columnsdat <- dat %>% dplyr::select(-c(surv, surv_next))```Let's take a quick look at the data set we'll be working with:```{r}head(dat, n =5)```------------------------------------------------------------------------We can produce a plot @fig-map of the tree locations, where the size of the point is scaled to basal area and colored by species, with each census displayed as a panel:```{r, fig.width = 8, fig.height=8}#| label: fig-map#| fig-cap: "Map of tree locations by census. Points are individual stems scaled by basal area, and colors represent the six-letter identifiers of 16 common species at the BCI forest plot."ggplot(dat, aes(x = gx, y = gy, size = ba, col = sp)) +geom_point() +facet_wrap(~census) +theme_bw(10) +theme(legend.position ="right") +coord_fixed(ratio =1) +labs(size ="Basal area", col ="Species")```## Define an exponential decay functionIn our neighborhood, we model neighborhood densities using an exponential decay function. In principle, it’s possible to use various different decay functions to explore various ranges that determine the rate of decay.Here, we present the general framework for how one might explore different scenarios, though the ultimate choice of decay function and parameter values will depend on the study and is beyond the scope of this Appendix. **Note that this method of calculating neighborhood density is only possible when neighbors' x-y coordinates are known.**First, we write a function in R for exponential decay with distance, where mu controls the shape of the curve:```{r}exponential_decay <-function(mu, distance){return(exp(-(1/mu * distance)))}```Let's see what the exponential decay function looks like across a range of mu values (@fig-decay):```{r}#| label: fig-decay#| fig-cap: "Plot of decay function"# Set range of mu values and distancesdecay_values <-seq(from =1, to =25, by =2)# Use sprintf to add leading 0s, will help with sorting later ondecay_names =paste("exp", sprintf("%02s", decay_values), sep ="") distances <-seq(1, 100, 1)# Generate a dataframe with each combination of mu and distanceexample_decay_values <-expand_grid(decay_values, distances) %>%# Rename columnsrename(decay_value = decay_values, distance = distances)# Evaluate distance decay function for each combination of # mu and distanceexample_decay_values$decay <-exponential_decay(mu = example_decay_values$decay_value,distance = example_decay_values$distance)# Plot resultsggplot(example_decay_values, aes(x = distance, y = decay, color = decay_value, group = decay_value)) +ylab("Density") +xlab("Distance (m)") +scale_color_continuous(name ="Value of \ndecay constant") +geom_line() +theme_bw(12)```### Determine which trees are at edge of plotTrees near the edge of plot boundaries have incomplete information about their neighborhood, because neighboring trees outside of the plot boundaries are not mapped, and we do not advise using boundary corrections to include those stems.Typically trees within certain distance threshold from the plot edge are excluded from analysis, but still included in calculations of neighborhood densities.We are going to add a column to our data set called 'edge' that is TRUE if within a set distance to the edge of the plot (*e.g.*, 30 m in the example below).For our example data set, the dimensions of the plot are 300 x 300 m, ranging from 0-300 on both the x and y axis, and representing a subset of the overall 50 ha forest dynamics plot at BCI.```{r}# Set threshold for distance to edgedistance_threshold_edge =30# Add in min and max values for corners of plotmin_x <-0max_x <-300min_y <-0max_y <-300dat$edge = dat$gx < min_x + distance_threshold_edge | dat$gx > max_x - distance_threshold_edge | dat$gy < min_y + distance_threshold_edge | dat$gy > max_y - distance_threshold_edge# How many trees fall within the edge threshold?table(dat$edge)```Below is a plot @fig-edge of tree locations colored by whether they fall within the edge threshold or not, separated out for each census.```{r}#| label: fig-edge#| fig-cap: "Map of tree locations colored by whether they fall within the edge threshold"ggplot(dat, aes(x = gx, y = gy, col = edge)) +geom_point() +facet_wrap(~census) +coord_fixed(ratio =1) +theme_bw(12)```## Calculate distances among focal individual and neighborsNext, we will calculate distances among individuals in our plot, using an upper threshold distance of what to consider a neighbor.We use the ['spatstat.geom' package](https://spatstat.org/) to efficiently calculate distances among individuals.Because this example only includes one census interval (two censuses total), we will subset down to just the first census to calculate neighborhood density.If the data set were to contain multiple census intervals, it would be necessary to calculate neighborhood density separately for each individual in each census interval, using only the individuals that were alive at the beginning of that census interval, or using an average density value between two consecutive censuses.```{r}# Set distance threshold for considering neighborsdistance_threshold_neighborhood <-30# Subset to first census - this will be different for different # datasetsdat_first_census <- dat %>%filter(census ==1)# Format into 'ppp' objectdat_ppp = spatstat.geom::ppp(dat_first_census$gx, dat_first_census$gy, window =owin(range(dat$gx), range(dat$gy)), checkdup = F)# Determine close pairs based on distance threshold# Returns a list that we convert to tibble laterneighbors = spatstat.geom::closepairs(dat_ppp, rmax = distance_threshold_neighborhood, # Max radiuswhat ="ijd", # return indicies i, j, and distance twice =TRUE)# Convert to dataframeneighbors <-as_tibble(neighbors)# Take a peek at the data# i = index of focal individual# j = index of neighbor# d = distance to neighborhead(neighbors)```Next we add in additional columns for individual characteristic, (*e.g.,* species, size) and whether they are located near the edge of the plot (blue area in @fig-edge).```{r}# add additional columns# Add species for individual ineighbors$sp_i = dat_first_census$sp[neighbors$i] # Add whether individual i is near edgeneighbors$edge_i = dat_first_census$edge[neighbors$i] # Add species for individual jneighbors$sp_j = dat_first_census$sp[neighbors$j] # Add basal area of individual jneighbors$ba_j = dat_first_census$ba[neighbors$j] ```We then want to add a column that indicates whether the comparison between the focal individual and the neighbor is conspecific or heterospecific because we are interested separately estimating the densities of conspecifics and heterospecifics.```{r}neighbors$comparison_type <-ifelse(neighbors$sp_i == neighbors$sp_j,yes ="con", # conspecificno ="het") # heterospecific```We then remove focal trees that are too close to the edge of the plot.```{r}# remove focal trees i that are at the edgeneighbors = neighbors[!neighbors$edge_i, ]```Next, we add columns to our neighbors data set that indicates the distance decay multiplier and the distance decay multiplier weighted by basal area.```{r}# Loop through distance decay valuesfor(x in1:length(decay_values)){# Add column for distance decay multiplier for each decay value# add _ba suffix to column name - will eventually be summed based on# number of individual neighbors neighbors[, paste0(decay_names[x], "_N")] <-exponential_decay(mu = decay_values[x], distance = neighbors$d)# Weight basal area by distance decay multiplier# add _ba suffix to column name neighbors[, paste0(decay_names[x], "_BA")] <-exponential_decay(mu = decay_values[x], distance = neighbors$d) * neighbors$ba_j}```Depending on how many distance decay values are being investigated, there may be many columns in the data frame.```{r}head(neighbors)```## Calculate neighborhood densityThen we summarize the neighborhood density for each focal tree separately for conspecifics and heterospecifics.```{r, message=FALSE}# Simple calculations of number of neighbors and total basal area, # ignoring distance decay multiplierneighbors_summary <- neighbors %>%group_by(i, comparison_type) %>%summarise(nodecay_N =n(), # count of neighborsnodecay_BA =sum(ba_j)) # sum of basal area)# Add in decay columnsneighbors_summary_decay <- neighbors %>%group_by(i, comparison_type) %>%# Select only columns related to distance decayselect(starts_with("exp")) %>%# Summarize them all by summing columnssummarise_all(sum) # Join both togetherneighbors_summary <-left_join(neighbors_summary, neighbors_summary_decay,by =c("i", "comparison_type"))# Add treeID columnneighbors_summary$treeID<-dat_first_census$treeID[neighbors_summary$i]# If there are any focal individuals with no neighbors, add values # of 0 for neighborhood densities noNeighbors = dat_first_census$treeID[!dat_first_census$treeID %in% neighbors_summary$treeID &!dat_first_census$edge]# If there are individuals with no neighborsif (length(noNeighbors) >0) { neighbors_summary =bind_rows(neighbors_summary, expand_grid(i =NA, treeID = noNeighbors, comparison_type =c("het", "cons"))) %>%# Add 0s where NAmutate_all(replace_na, replace =0) }# Take a peak at the datahead(neighbors_summary)```As described in the main text, it can be advantageous to use total density that includes combined conspecific and heterospecific densities as a covariate, rather than heterospecific density.Here, we calculate total density by summing heterospecific and conspecific densities.```{r, message=FALSE}# First convert to long format which will make it easy to sum across # heterospecific and conspecific valuesneighbors_summary_long_format <- neighbors_summary %>%pivot_longer(cols =-c("i", "comparison_type", "treeID"))# Sum across heterospecific and conspecific values and rename to 'total'neighbors_total_long_format <- neighbors_summary_long_format %>%group_by(i, treeID, name) %>%summarize(value =sum(value)) %>%mutate(comparison_type ="total")# Bind together conspecific and 'total' densities# remove heterospecific columns# fill in 0s where there are no neighborsneighbors_summary_total =bind_rows(neighbors_summary_long_format, neighbors_total_long_format) %>%# Can filter out heterospecific neighborhood # values by uncommenting this line of the# objects become too large# filter(comparison_type != "het") %>% mutate(name =paste0(comparison_type, "_", name)) %>%select(-comparison_type) %>%pivot_wider(names_from = name, values_from = value, values_fill =0)```## Model mortality as a function of neighborhood densityTo determine the 'best' decay parameter to use, we fit species-specific mortality models using Generalized Additive Models [GAMs](https://noamross.github.io/gams-in-r-course/) and compare models based on log likelihoods. GAMs are flexible statistical models that combine linear and non-linear components to capture complex relationships in data.We first create our data set that we will use in the GAMs, subsetting down to just one census interval (because our example dataset only has 2 censuses) and removing trees close to the edge as above. In other datasets, you may have multiple census intervals, where it would be common practice to include 'year' or 'census' as a random effect in the model, but otherwise the overall approach is similar.```{r, message=FALSE}# Join census data with neighborhood datadat_gam <-left_join(dat_first_census, neighbors_summary_total,by ="treeID")# Remove edge trees (these will have NAs for densities calcualations)dat_gam <- dat_gam %>%filter(edge ==FALSE)```For each species, we summarize data availability to help determine parameters for the GAM smooth terms.```{r}# Summarize data availability at species level to set the degree of # smoothness for GAM smooth termssp_summary <- dat_gam %>%group_by(sp) %>%summarise(ndead =sum(mort_next),range_con_BA =max(con_nodecay_BA) -min(con_nodecay_BA),max_con_BA =max(con_nodecay_BA),unique_con_BA =length(unique(con_nodecay_BA)),unique_total_BA =length(unique(total_nodecay_BA)),range_con_N =max(con_nodecay_N) -min(con_nodecay_N),max_con_N =max(con_nodecay_N),unique_con_N =length(unique(con_nodecay_N)),unique_total_N =length(unique(total_nodecay_N)),unique_dbh =length(unique(dbh)) )# Filter out species that have 0% mortalitysp_summary <- sp_summary %>%filter(ndead >0)```In this long block of code, we loop over all possible combinations of decay values for different metrics of neighborhood densities weighted by distance for each species and fit a separate GAM for each model. For each GAM, we assess whether the model was able to be fit, and when it was able to be fit, whether it converged or produced warnings. We save the results of successful model fits into a list that we will process later.For large datasets where individual GAMs take a long time to run, the code could be modified to run in parallel, either locally on a personal computer or across a computing cluster.```{r, message=FALSE, warning=FALSE}# Initialize list that will save model outputsres_mod <-list()# Model run settingsrun_settings <-expand_grid(species =unique(sp_summary$sp),decay_con =c("nodecay", decay_names),decay_total =c("nodecay", decay_names),nhood_data_type =c("N", "BA"))# Loop through model run settingsfor(run_settings_row in1:nrow(run_settings)){# Extract values from run settings dataframe species <- run_settings$species[run_settings_row] decay_con <- run_settings$decay_con[run_settings_row] decay_total <- run_settings$decay_total[run_settings_row] nhood_data_type <- run_settings$nhood_data_type[run_settings_row]# Subset down to just focal species dat_subset <- dat_gam %>%filter(sp == species)# Set run name run_name <-paste0(species, "_total", decay_total,"_con", decay_con, "_", nhood_data_type)# Print status if desired# cat("Working on run: ", run_name, " ...\n")# Create model formula# Initial DBH included as predictor variable form =paste0("mort_next ~ s(dbh, k = k1) + s(total_", decay_total, "_", nhood_data_type, ", k = k2) + s(con_", decay_con, "_", nhood_data_type, ", k = k3)")# Convert into formula form <-as.formula(form)# Choose penalties for model fitting# set to default 10 (the same as -1)# The higher the value of k, the more flexible the smooth term becomes, allowing for more intricate and wiggly patterns. Conversely, lower values of k result in smoother and simpler representations. k1 = k2 = k3 =10if (k1 > sp_summary$unique_dbh[sp_summary$sp == species]) { k1 = sp_summary$unique_dbh[sp_summary$sp == species] -2 }if (k2 > sp_summary$unique_total_N[sp_summary$sp == species]) { k2 = sp_summary$unique_total_N [sp_summary$sp == species]-2 }if (k3 > sp_summary$unique_con_N[sp_summary$sp == species]) { k3 = sp_summary$unique_con_N[sp_summary$sp == species] -2 }# Fit model# wrap in a try function to catch any errors mod =try(gam(form,family =binomial(link=cloglog),offset =log(interval),data = dat_subset,method ="REML"), silent = T)# Check if model was able to fitif (!any(class(mod) =="gam")) {# print(paste("gam failed for:", run_name)) } else {# Check if gam convergedif (!mod$converged) {# print(paste("no convergence for:", run_name)) } else {# check for complete separation# https://stats.stackexchange.com/questions/336424/issue-with-complete-separation-in-logistic-regression-in-r# Explore warning "glm.fit: fitted probabilities numerically 0 # or 1 occurred" eps <-10* .Machine$double.eps glm0.resids <-augment(x = mod) %>%mutate(p =1/ (1+exp(-.fitted)),warning = p >1-eps,influence =order(.hat, decreasing = T)) infl_limit =round(nrow(glm0.resids)/10, 0)# check if none of the warnings is among the 10% most # influential observations, than it is okay.. num =any(glm0.resids$warning & glm0.resids$influence < infl_limit)# complete separationif (num) {# print(paste("complete separation is likely for:", run_name)) } else {# missing Vcif (is.null(mod$Vc)) {# print(paste("Vc not available for:", run_name)) } else {# Add resulting model to list if it passes all checks res_mod[[run_name]] <- mod } # Vc ifelse } # complete separation ifelse } # convergence ifelse } # model available ifelse} # end run settings loop```## Summarize model fitsNext we will extract summaries for each model with broom::glance() that provides key information like degrees of freedom, log likelihood, AIC, etc.```{r, results='asis'}# Extract summaries for each model into a listsums =lapply(res_mod, broom::glance) # Add a column for model run to each object in the listsums =Map(cbind, sums, run_name =names(sums)) sums =do.call(rbind, sums) # Bind elements of list together by rowsrownames(sums) <-NULL# Remove row names# Separate run name into columns for species, decay, and density typesums <- sums %>%separate(run_name, into =c("sp", "decay_total", "decay_con", "density_type"), remove =FALSE)# Remove 'total' and 'con' from decay columnssums$decay_total <-gsub("total", "", sums$decay_total)sums$decay_con <-gsub("con", "", sums$decay_con)# Rearrange columns sums <- sums %>%select(run_name, sp, decay_total, decay_con, density_type, everything()) # Take a look at the model summarieshead(sums)```Due to limited sample sizes, it is likely that GAMs will not fit for each species. For example, in the table below of summary data for species where models did not successfully fit/converge, there are several instances where all individuals of the species survived during the census, leading to no mortality events to use in the model.\\We need to exclude species without complete model runs from our overall calculations when looking for optimal decay parameters across the entire data set.```{r}# Tally up number of model runs by species and total decay valuestable(sums$sp, sums$decay_total)# get incomplete run-site-species combinationsrun_counts_by_sp <- sums %>%group_by(sp) %>%tally() %>%# Join with overall species listleft_join(sp_summary %>%select(sp), ., by ="sp") # Get expected number of runs if all models workexpected_total_runs <- run_settings %>%group_by(species) %>%tally() %>%pull(n) %>%max()# Save species names where they didn't have all expected combinations # of model runsincomplete = run_counts_by_sp$sp[run_counts_by_sp$n < expected_total_runs |is.na(run_counts_by_sp$n)]# Species with successful runshead(sp_summary[!sp_summary$sp %in% incomplete, ])# Species without successful runshead(sp_summary[sp_summary$sp %in% incomplete, ])```## Selecting optimum decay parameter values across all speciesWe then summarize different model criteria across all species runs. To look for the optimal value for decay parameters, we sum log likelihoods across all species for a given decay parameter combination and choose the resulting parameter combination with the highest summed log likelihood.One crucial point is that each run of the grid search should be done with the same set of trees, even if smaller neighborhodoes not need a buffer of 30 m.```{r, message=FALSE}sums_total <- sums %>%filter(!sp %in% incomplete) %>%group_by(decay_total, decay_con, density_type) %>%summarise(nvalues =n(),sumlogLik =sum(logLik),meanlogLik =mean(logLik)) %>%arrange(decay_total, decay_con, density_type)sums_total```We then create a heatmap plot @fig-optim of summed log likelihoods for all fixed devay (mu) parameter combinations across all species, with the optimal parameter combination marked with an X```{r, fig.height = 15, fig.width=10}#| label: fig-optim#| fig-cap: "Heatmap of optimal values for decay constants"# Find optimum value separately for N and BA optimum <- sums_total %>%group_by(density_type) %>%slice_max(sumlogLik)# Plot heatmap of log likelihood valuesggplot(sums_total, aes(x = decay_total, y = decay_con, fill = sumlogLik)) +geom_tile(width =0.9, height =0.9, col ="black") +scale_fill_viridis_c() +# viridis color palettegeom_label(data = optimum, label ="X") +labs(x ="Decay total density", y ="Decay conspecific density", fill ="sumlogLik") +facet_wrap(~density_type, ncol =1) +theme_bw(12) +theme(axis.text.x =element_text(angle =90, vjust =0.5, hjust=1))```For this data set, the following are the optimal decay parameter values across all species separately for neighborhood density calculated by abundance (N) and by basal area (BA)```{r}optimum```## Selecting optimum decay parameter values separately for each speciesAlternatively, it may be useful to determine optimum decay parameters on a species by species basis. To do this, we find the maximum log liklihood for each species separately across decay parameter combination and choose the resulting parameter combination with the highest log likelihood.```{r, message=FALSE}sums_total_by_spp <- sums %>%filter(!sp %in% incomplete) %>%group_by(decay_total, decay_con, density_type, sp) %>%summarise(nvalues =n(),sumlogLik =sum(logLik),meanlogLik =mean(logLik)) %>%arrange(decay_total, decay_con, density_type)sums_total_by_spp```We create a heatmap plot @fig-optim-spp of log likelihoods for all fixed decay parameter combinations for each species separately, with the optimal parameter combination marked with an X. We only display the first three species here, because with data sets containing many species, it's hard to visualize all the species on one graph and you may wish to subdivide the plot further for visualization.```{r, fig.height = 15, fig.width=10}#| label: fig-optim-spp#| fig-cap: "Heatmap of optimal values for decay constants separately for each species" sums <- sums %>%filter(sp %in%c("cappfr", "cordbi", "des2pa")) %>%group_by(sp) %>%# Scale loglikihood by species to help with visualizationmutate(logLik_scaled =scale(logLik)) %>%ungroup()# Find optimum value separately for N and BA optimum_by_sp <- sums %>%group_by(sp, density_type) %>%slice_max(logLik_scaled, with_ties =FALSE)# Plot heatmap of log likelihood valuesggplot(sums, aes(x = decay_total, y = decay_con,fill = logLik_scaled)) +geom_tile(width =0.9, height =0.9, col ="black") +geom_label(data = optimum_by_sp, label ="X") +labs(x ="Decay total density", y ="Decay conspecific density", fill ="logLik") +facet_wrap(~sp + density_type, ncol =2, scales ="free") +scale_fill_viridis_c() +# viridis color palettetheme_bw(12) +theme(legend.position ="right") +labs(fill ="Log likelihood\n(scaled)") +theme(axis.text.x =element_text(angle =90, vjust =0.5, hjust=1))```For this data set, the following are the optimal decay parameter values for each species separately for neighborhood density calculated by abundance (N) and by basal area (BA):```{r}optimum_by_sp```